Isabella County
Field-wise Learning for Multi-field Categorical Data
We propose a new method for learning with multi-field categorical data. Multi-field categorical data are usually collected over many heterogeneous groups. These groups can be reflected in the categories under a field. The existing methods try to learn a universal model that fits all data, which is challenging and inevitably results in learning a complex model. In contrast, we propose a field-wise learning method leveraging the natural structure of data to learn simple yet efficient one-to-one field-focused models with appropriate constraints.
Bin2Vec: Interpretable and Auditable Multi-View Binary Analysis for Code Plagiarism Detection
Moussaoui, Moussa, Houichime, Tarik, Sadiq, Abdelalim
We introduce Bin2Vec, a new framework that helps compare software programs in a clear and explainable way. Instead of focusing only on one type of information, Bin2Vec combines what a program looks like (its built-in functions, imports, and exports) with how it behaves when it runs (its instructions and memory usage). This gives a more complete picture when deciding whether two programs are similar or not. Bin2Vec represents these different types of information as views that can be inspected separately using easy-to-read charts, and then brings them together into an overall similarity score. Bin2Vec acts as a bridge between binary representations and machine learning techniques by generating feature representations that can be efficiently processed by machine-learning models. We tested Bin2Vec on multiple versions of two well-known Windows programs, PuTTY and 7-Zip. The primary results strongly confirmed that our method computes an optimal and visualization-friendly representation of the analyzed software. For example, PuTTY versions showed more complex behavior and memory activity, while 7-Zip versions focused more on performance-related patterns. Overall, Bin2Vec provides decisions that are both reliable and explainable to humans. Because it is modular and easy to extend, it can be applied to tasks like auditing, verifying software origins, or quickly screening large numbers of programs in cybersecurity and reverse-engineering work.
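The idea of scoring each view separately and then combining the views into one similarity score can be illustrated with a minimal sketch. The abstract does not say which per-view measures or weights Bin2Vec uses, so everything here is an assumption made for illustration: Jaccard similarity for import/export sets, cosine similarity for instruction counts, a weighted average as the combiner, and the dictionary layout of `prog_a` and `prog_b`.

```python
import math


def jaccard(a, b):
    """Jaccard similarity between two sets (1.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def multi_view_similarity(prog_a, prog_b, weights):
    """Return the per-view scores and their weighted average.

    The view names and the weighted-average combiner are hypothetical,
    not taken from the Bin2Vec paper.
    """
    views = {
        "imports": jaccard(prog_a["imports"], prog_b["imports"]),
        "exports": jaccard(prog_a["exports"], prog_b["exports"]),
        "instructions": cosine(prog_a["instructions"], prog_b["instructions"]),
    }
    total = sum(weights[name] for name in views)
    overall = sum(weights[name] * score for name, score in views.items()) / total
    return views, overall


# Hypothetical feature dictionaries for two program versions.
prog_a = {
    "imports": {"kernel32.dll", "user32.dll"},
    "exports": set(),
    "instructions": {"mov": 10, "call": 3},
}
prog_b = {
    "imports": {"kernel32.dll", "ws2_32.dll"},
    "exports": set(),
    "instructions": {"mov": 8, "call": 5, "jmp": 1},
}
views, overall = multi_view_similarity(
    prog_a, prog_b, {"imports": 1.0, "exports": 1.0, "instructions": 1.0}
)
print(views, overall)
```

Returning the per-view scores alongside the overall score keeps each view inspectable on its own, which is the property the abstract emphasizes.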
- North America > United States > Michigan > Isabella County (0.06)
- North America > Canada > Ontario > Kingston (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (3 more...)
BanglaASTE: A Novel Framework for Aspect-Sentiment-Opinion Extraction in Bangla E-commerce Reviews Using Ensemble Deep Learning
Islam, Ariful, Hossen, Md Rifat, Ahmed, Abir, Haque, B M Taslimul
Aspect-Based Sentiment Analysis (ABSA) has emerged as a critical tool for extracting fine-grained sentiment insights from user-generated content, particularly in e-commerce and social media domains. However, research on Bangla ABSA remains significantly underexplored due to the absence of comprehensive datasets and specialized frameworks for triplet extraction in this language. This paper introduces BanglaASTE, a novel framework for Aspect Sentiment Triplet Extraction (ASTE) that simultaneously identifies aspect terms, opinion expressions, and sentiment polarities from Bangla product reviews. Our contributions include: (1) creation of the first annotated Bangla ASTE dataset containing 3,345 product reviews collected from major e-commerce platforms including Daraz, Facebook, and Rokomari; (2) development of a hybrid classification framework that employs graph-based aspect-opinion matching with semantic similarity techniques; and (3) implementation of an ensemble model combining BanglaBERT contextual embeddings with XGBoost boosting algorithms for enhanced triplet extraction performance. Experimental results demonstrate that our ensemble approach achieves superior performance with 89.9% accuracy and 89.1% F1-score, significantly outperforming baseline models across all evaluation metrics. The framework effectively addresses key challenges in Bangla text processing including informal expressions, spelling variations, and data sparsity. This research advances the state-of-the-art in low-resource language sentiment analysis and provides a scalable solution for Bangla e-commerce analytics applications.
- Asia > Bangladesh (0.05)
- North America > United States > Michigan > Isabella County > Mount Pleasant (0.04)
- North America > United States > Texas > Irion County (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- South America > Paraguay > Asunción > Asunción (0.04)
- (2 more...)
- North America > United States > Virginia (0.04)
- Africa > Sudan (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- (7 more...)
- Government (0.46)
- Information Technology (0.46)
- North America > United States > Michigan > Isabella County (0.43)
- Europe > Switzerland (0.04)
- Asia > Thailand (0.04)
- (4 more...)
7078971350bcefbc6ec2779c9b84a9bd-AuthorFeedback.pdf
We appreciate all reviewers' valuable comments, and are greatly encouraged by the positive comments, e.g. the problem To Reviewer 1 & 2 on missing recent related methods. We would include these results in supplementary materials of the final version. To Reviewer 1: Q1 - The improvement of your method is not impressive. Our model improves Logloss by 0.002 and produces more Some columns contain site_id and ad_id so the dimensionality is very large. To Reviewer 2: Q1 - The proposed method lacks technique contributions.
SARIMAX-Based Power Outage Prediction During Extreme Weather Events
Ye, Haoran, Sun, Qiuzhuang, Yang, Yang
This study develops a SARIMAX-based prediction system for short-term power outage forecasting during extreme weather events. Using hourly data from Michigan counties with outage counts and comprehensive weather features, we implement a systematic two-stage feature engineering pipeline: data cleaning to remove zero-variance and unknown features, followed by correlation-based filtering to eliminate highly correlated predictors. The selected features are augmented with temporal embeddings, multi-scale lag features, and weather variables with their corresponding lags as exogenous inputs to the SARIMAX model. To address data irregularity and numerical instability, we apply standardization and implement a hierarchical fitting strategy with sequential optimization methods, automatic downgrading to ARIMA when convergence fails, and historical mean-based fallback predictions as a final safeguard. The model is optimized separately for short-term (24 hours) and medium-term (48 hours) forecast horizons using RMSE as the evaluation metric. Our approach achieves an RMSE of 177.2, representing an 8.4% improvement over the baseline method (RMSE = 193.4), thereby validating the effectiveness of our feature engineering and robust optimization strategy for extreme weather-related outage prediction.
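The hierarchical fitting strategy (try the full model, downgrade when fitting fails, fall back to the historical mean) can be sketched roughly as follows. This is only an illustration of the control flow, not the authors' implementation: the fitters are plain callables, and `failing_sarimax` and `naive_last_value` are hypothetical stand-ins for real SARIMAX and ARIMA fits. The last lines also recompute the relative RMSE improvement from the two values reported in the abstract.

```python
import math


def fit_with_fallback(history, horizon, fitters):
    """Try each (name, fitter) in order; use the historical mean as a last resort.

    A fitter is any callable (history, horizon) -> list of forecasts.
    A stage counts as failed if it raises or returns non-finite values.
    """
    for name, fitter in fitters:
        try:
            forecast = list(fitter(history, horizon))
        except Exception:
            continue  # this stage failed (e.g. no convergence); try the next one
        if len(forecast) == horizon and all(math.isfinite(x) for x in forecast):
            return name, forecast
    mean = sum(history) / len(history)
    return "historical_mean", [mean] * horizon


def rmse(actual, predicted):
    """Root mean squared error over paired observations."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def relative_improvement(baseline, ours):
    """Fractional reduction of an error metric relative to a baseline."""
    return (baseline - ours) / baseline


# Hypothetical stand-ins for the real model stages.
def failing_sarimax(history, horizon):
    raise RuntimeError("optimizer did not converge")


def naive_last_value(history, horizon):
    return [history[-1]] * horizon


stage, forecast = fit_with_fallback(
    [120.0, 180.0, 150.0],
    horizon=2,
    fitters=[("sarimax", failing_sarimax), ("arima", naive_last_value)],
)
print(stage, forecast)

# RMSE values reported in the abstract: 193.4 (baseline) and 177.2 (proposed).
print(f"{relative_improvement(193.4, 177.2):.1%}")
```

With the reported values, (193.4 - 177.2) / 193.4 is about 0.084, which matches the stated 8.4% improvement.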
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.25)
- Asia > Singapore (0.05)
- North America > United States > New York > New York County > New York City (0.05)
- (4 more...)
- Information Technology > Data Science > Data Quality (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
V2X-UniPool: Unifying Multimodal Perception and Knowledge Reasoning for Autonomous Driving
Luo, Xuewen, Yang, Fengze, Ding, Fan, Gao, Xiangbo, Xing, Shuo, Zhou, Yang, Tu, Zhengzhong, Liu, Chenxi
Autonomous driving (AD) has achieved significant progress, yet single-vehicle perception remains constrained by sensing range and occlusions. Vehicle-to-Everything (V2X) communication addresses these limits by enabling collaboration across vehicles and infrastructure, but it also faces heterogeneity, synchronization, and latency constraints. Language models offer strong knowledge-driven reasoning and decision-making capabilities, but they are not inherently designed to process raw sensor streams and are prone to hallucination. We propose V2X-UniPool, the first framework that unifies V2X perception with language-based reasoning for knowledge-driven AD. It transforms multimodal V2X data into structured, language-based knowledge, organizes it in a time-indexed knowledge pool for temporally consistent reasoning, and employs Retrieval-Augmented Generation (RAG) to ground decisions in real-time context. Experiments on the real-world DAIR-V2X dataset show that V2X-UniPool achieves state-of-the-art planning accuracy and safety while reducing communication cost by more than 80%, achieving the lowest overhead among evaluated methods. These results highlight the promise of bridging V2X perception and language reasoning to advance scalable and trustworthy driving. Our code is available at: https://github.com/Xuewen2025/V2X-UniPool
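A time-indexed knowledge pool that feeds retrieved context into a prompt might look something like the toy sketch below. It is not the V2X-UniPool code: the `KnowledgePool` class, its methods, the example facts, and the prompt layout are all hypothetical, and retrieval here is a plain time-window lookup rather than the retrieval method used in the paper.

```python
import bisect


class KnowledgePool:
    """Toy store of text facts kept sorted by timestamp (hypothetical design)."""

    def __init__(self):
        self._times = []
        self._facts = []

    def add(self, timestamp, fact):
        """Insert a fact while keeping both lists ordered by timestamp."""
        idx = bisect.bisect_right(self._times, timestamp)
        self._times.insert(idx, timestamp)
        self._facts.insert(idx, fact)

    def retrieve(self, now, window):
        """Return facts with now - window <= timestamp <= now, oldest first."""
        lo = bisect.bisect_left(self._times, now - window)
        hi = bisect.bisect_right(self._times, now)
        return self._facts[lo:hi]


def build_prompt(pool, now, window, question):
    """Assemble a prompt whose context is limited to recent facts."""
    context = "\n".join(f"- {fact}" for fact in pool.retrieve(now, window))
    return f"Context:\n{context}\n\nQuestion: {question}"


pool = KnowledgePool()
pool.add(1.0, "Roadside camera: pedestrian waiting at the crosswalk")
pool.add(3.0, "Vehicle lidar: truck occluding the left lane")
pool.add(2.0, "Traffic signal: light turned yellow")

print(build_prompt(pool, now=3.0, window=1.0, question="Is it safe to proceed?"))
```

Restricting the context to a recent window is one simple way to keep the text passed to a language model tied to the current scene; a full system would also need to rank facts by relevance.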
- North America > United States > Utah (0.05)
- North America > United States > Texas > Brazos County > College Station (0.05)
- North America > United States > Michigan > Isabella County (0.04)
- (3 more...)
- Information Technology (0.90)
- Transportation > Ground > Road (0.70)
- Transportation > Infrastructure & Services (0.47)
Extract-0: A Specialized Language Model for Document Information Extraction
This paper presents Extract-0, a 7-billion-parameter language model specifically optimized for document information extraction that achieves performance exceeding models with parameter counts several orders of magnitude larger. Through a novel combination of synthetic data generation, supervised fine-tuning with Low-Rank Adaptation (LoRA), and reinforcement learning via Group Relative Policy Optimization (GRPO), Extract-0 achieves a mean reward of 0.573 on a benchmark of 1,000 diverse document extraction tasks, outperforming GPT-4.1 (0.457), o3 (0.464), and GPT-4.1-2025 (0.459). The training methodology employs a memory-preserving synthetic data generation pipeline that produces 280,128 training examples from diverse document sources, followed by parameter-efficient fine-tuning that modifies only 0.53% of model weights (40.4M out of 7.66B parameters). The reinforcement learning phase introduces a novel semantic similarity-based reward function that handles the inherent ambiguity in information extraction tasks. This research demonstrates that task-specific optimization can yield models that surpass general-purpose systems while requiring substantially fewer computational resources.
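Two of the quantities above lend themselves to a short worked sketch. The first function shows the group-relative normalization that GRPO is generally described as using: rewards for several sampled answers to the same prompt are centered and scaled within the group. The use of the population standard deviation, the `eps` term, and the example rewards are assumptions, not details from the Extract-0 paper. The second function recomputes the trainable-weight fraction from the counts quoted in the abstract.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Center and scale rewards within one group of sampled completions.

    Population standard deviation and eps are illustrative choices;
    implementations differ on both.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def trainable_fraction(trainable_params, total_params):
    """Share of model weights that are updated during fine-tuning."""
    return trainable_params / total_params


# Hypothetical rewards for three sampled extractions of the same document.
advantages = group_relative_advantages([0.2, 0.4, 0.6])
print(advantages)

# Counts quoted in the abstract: 40.4M trainable out of 7.66B parameters.
print(f"{trainable_fraction(40.4e6, 7.66e9):.2%}")
```

The ratio 40.4M / 7.66B is about 0.0053, consistent with the 0.53% figure in the abstract.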
- South America > Brazil > São Paulo (0.04)
- North America > United States > Michigan > Isabella County (0.04)
- Asia > Middle East > Jordan (0.04)
- Law (0.47)
- Government (0.46)
- Banking & Finance (0.46)
- Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.72)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.71)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)